# Efficient vision-language model
OmniGen2
Apache-2.0
OmniGen2 is a powerful and efficient unified multimodal model composed of a 3B vision-language model and a 4B diffusion model. It supports visual understanding, text-to-image generation, instruction-guided image editing, and in-context generation.
Text-to-Image
OmniGen2
MobileCLIP-B (LT) OpenCLIP
MobileCLIP-B (LT) is an efficient image-text model developed by Apple. It achieves fast zero-shot image classification through multi-modal reinforced training and outperforms comparable models.
Zero-Shot Image Classification
apple
MobileVLM 1.7B
Apache-2.0
MobileVLM is a lightweight vision-language model designed specifically for mobile devices, supporting efficient image understanding and text generation.
Image-Text-to-Text
Transformers

mtgv